Evaluating automatic syllabification algorithms for English

Authors

  • Yannick Marchand
  • Connie R. Adsett
  • Robert I. Damper
Abstract

The three lexical databases

  • 18,016 words were found in both the Webster’s Pocket Dictionary and the Wordsmyth English Dictionary-Thesaurus.
  • These two independent dictionaries, each consisting of 18,016 syllabified entries, are referred to as S&R and Wordsmyth, respectively.
  • A third database, Intersection, was derived, consisting of the 13,594 words whose syllabification patterns are identical in the two independent dictionaries.

Institute for Biodiagnostics
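
The derivation of the Intersection database described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each dictionary is held as a mapping from a word to a syllabified string, with a hyphen marking each syllable boundary. The function name `derive_intersection` and the toy entries are hypothetical and are not taken from either dictionary.

```python
def derive_intersection(dict_a, dict_b):
    """Keep only the words that both dictionaries syllabify identically.

    Both arguments map a word to its syllabified form. The hyphen-delimited
    format is an assumption made for this sketch.
    """
    shared_words = dict_a.keys() & dict_b.keys()
    return {
        word: dict_a[word]
        for word in sorted(shared_words)
        if dict_a[word] == dict_b[word]
    }


# Toy entries invented for illustration only; they are NOT real dictionary data.
s_and_r = {"water": "wa-ter", "sister": "sis-ter", "happy": "hap-py"}
wordsmyth = {"water": "wa-ter", "sister": "sis-ter", "happy": "ha-ppy"}

intersection = derive_intersection(s_and_r, wordsmyth)
print(len(intersection), "of", len(s_and_r), "words agree")
```

In the source, the same filtering step reduces the 18,016 shared words to the 13,594 entries of Intersection.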

Similar resources

Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion

We present the first English syllabification system to improve the accuracy of letter-to-phoneme conversion. We propose a novel discriminative approach to automatic syllabification based on structured SVMs. In comparison with a state-of-the-art syllabification system, we reduce the syllabification word error rate for English by 33%. Our approach also performs well on other languages, comparing f...

Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian

Linguistic rules have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been challenged for the English language where data-driven algorithms have been shown to outperform rule-based methods. It may be possible, however, that data-driven methods are only better for languages with complex syllable structures. In this study, three rule-b...

Automatic syllabification in English: a comparison of different algorithms.

Automatic syllabification of words is challenging, not least because the syllable is not easy to define precisely. Consequently, no accepted standard algorithm for automatic syllabification exists. There are two broad approaches: rule-based and data-driven. The rule-based method effectively embodies some theoretical position regarding the syllable, whereas the data-driven paradigm tries to infe...

Are rule-based syllabification methods adequate for languages with low syllabic complexity? The case of Italian

Syllabification information is a valuable component in speech synthesis systems. Linguistic rule-based methods have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been shown to be incorrect for the English language where data-driven algorithms have been shown to outperform rule-based methods. It may be possible, however, that data-d...

Automatic word stress marking and syllabification for Catalan TTS

Stress and syllabification are essential attributes for several components in text-to-speech (TTS) systems. They are responsible for improving grapheme-to-phoneme conversion rules and for enhancing the synthetic intelligibility, since stress and syllable are key units in prosody prediction. This paper presents three linguistically rule-based automatic algorithms for Catalan text-to-speech conve...

Publication date: 2007